dates = 
  census %>%
  mutate(Month = substr(date, 1, 2), Day = substr(date, 3, 4), Year = substr(date, 5, 8))

skimr::skim_without_charts(dates)
Data summary
Name dates
Number of rows 3023
Number of columns 19
_______________________
Column type frequency:
character 16
numeric 3
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
unique_squirrel_id 0 1.00 13 14 0 3018 0
hectare 0 1.00 3 3 0 339 0
shift 0 1.00 2 2 0 2 0
date 0 1.00 8 8 0 11 0
age 121 0.96 1 8 0 3 0
primary_fur_color 55 0.98 4 8 0 3 0
highlight_fur_color 1086 0.64 4 22 0 10 0
combination_of_primary_and_highlight_color 0 1.00 1 27 0 22 0
location 64 0.98 12 12 0 2 0
lat_long 0 1.00 38 45 0 3023 0
activity 200 0.93 6 8 0 5 0
reaction 780 0.74 9 11 0 3 0
sounds 2884 0.05 4 5 0 3 0
Month 0 1.00 2 2 0 1 0
Day 0 1.00 2 2 0 11 0
Year 0 1.00 4 4 0 1 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
hectare_squirrel_number 0 1 4.12 3.10 1.00 2.00 3.00 6.00 23.00
long 0 1 -73.97 0.01 -73.98 -73.97 -73.97 -73.96 -73.95
lat 0 1 40.78 0.01 40.76 40.77 40.78 40.79 40.80
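
The Month, Day and Year columns are cut out of the 8-character date string by position. As a rough alternative sketch, the same fields can be obtained by parsing the string as a real date first, which also gives a column that sorts chronologically. The toy data frame and the names `toy_census`, `toy_dates` and `date_parsed` are made up for illustration, and the sketch assumes the MMDDYYYY layout implied by the `substr()` positions.

```r
library(dplyr)

# Toy stand-in for the census data (hypothetical rows, MMDDYYYY strings)
toy_census <- data.frame(date = c("10072018", "10132018"),
                         stringsAsFactors = FALSE)

toy_dates <-
  toy_census %>%
  mutate(
    date_parsed = as.Date(date, format = "%m%d%Y"),  # parse the 8-character string
    Month = format(date_parsed, "%m"),               # two-digit month as text
    Day   = format(date_parsed, "%d"),               # two-digit day as text
    Year  = format(date_parsed, "%Y")                # four-digit year as text
  )

toy_dates
```

Keeping `date_parsed` alongside the text columns makes it possible to use a date axis in later plots if that is ever needed.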

Frequency

In this part, we look at the Central Park squirrel census collected in October 2018. The graph below shows the number of squirrel sightings recorded on each day, split by observation shift (AM or PM).

dates %>% 
  group_by(Day, shift) %>%
  count() %>%
  ggplot() +
  geom_col(aes(x = Day, y = n, fill = shift)) +
  scale_fill_brewer(palette = "Paired") +
  labs(x = "Day", y = "Number of Observations", fill = "Time of day") +
  ggtitle('Central Park Squirrels Distribution by Days (AM/PM)') +
  theme(plot.title = element_text(hjust = 0.5), panel.grid.minor = element_blank(), panel.grid.major = element_blank())

The first graph plots the number of observations per day, split by shift (AM or PM). More sightings were recorded in the PM shift than in the AM shift, which suggests that squirrels tended to be more active in the afternoon. One limitation is that the data record only the shift, not the exact time of each sighting. We assume that most PM sightings occurred before sunset, since squirrels should be busy collecting food while there is daylight.
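
The AM versus PM comparison can also be checked numerically rather than read off the stacked bars. The following is a minimal sketch on made-up rows; `toy_sightings` and `shift_totals` are hypothetical names, and the counts are not taken from the census.

```r
library(dplyr)

# Hypothetical sightings: one row per observation, as in the census
toy_sightings <- data.frame(
  Day   = c("07", "07", "07", "13", "13"),
  shift = c("AM", "PM", "PM", "AM", "PM"),
  stringsAsFactors = FALSE
)

# Total number of sightings per shift, pooled over all days
shift_totals <-
  toy_sightings %>%
  count(shift)

shift_totals
```

Applying the same `count(shift)` call to `dates` would give the overall totals behind the chart.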

Distribution by Primary Fur Colors

This graph shows the number of squirrel sightings recorded on each day, broken down by primary fur color.

dates %>%
  group_by(Day, primary_fur_color) %>% 
  count() %>%
  ggplot() +
  geom_col(aes(x = Day, y = n, fill = primary_fur_color)) +
  # a single manual fill scale; NA (color not identified) is drawn in white
  scale_fill_manual(values = c("#000000", "#D2691E", "#D3D3D3"), na.value = "white") +
  ggtitle('Central Park Squirrels Distribution by Primary Fur Color') +
  theme(plot.title = element_text(hjust = 0.5), panel.grid.minor = element_blank(), panel.grid.major = element_blank()) +
  labs(x = 'Day', y = 'Number of Observations', fill = 'Primary Fur Color')

The second graph plots the number of observations per day, broken down by primary fur color. The number of observations varied from day to day with no clear pattern. The most squirrels were observed on Oct. 7 and Oct. 13, and noticeably fewer in the last few days. Overall, gray squirrels were the most numerous and black ones the fewest. Cinnamon squirrels were also observed fairly often, along with some whose color was not identified.

Distribution by Age Group

pie_1 = 
  dates %>% 
  filter(age !='?') %>% 
  filter(!is.na(age)) 


fig_1 = 
  plot_ly(data = count(pie_1, age), labels = ~age, values = ~n, type = 'pie', insidetextfont = list(color = '#FFFFFF')) %>% 
  layout(title = 'Central Park Squirrels Distribution by Age Group',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
  
fig_1

The third graph is a pie chart showing the distribution of squirrels by age group. The majority (88.6%) were adults and the remaining 11.4% were juveniles. It is unclear how the observers determined age, possibly by size. One limitation is that only ‘adult’ and ‘juvenile’ were recorded; the analysis might be more informative if other stages such as ‘baby’ or ‘old’ were available.
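
The percentages shown on the pie chart are computed by plotly from the counts. A short sketch of the same arithmetic, using a made-up age column (`toy_ages` and `age_share` are hypothetical names and the values are not from the census):

```r
library(dplyr)

# Hypothetical age column, including the '?' and NA entries that get removed
toy_ages <- data.frame(age = c("Adult", "Adult", "Adult", "Juvenile", "?", NA),
                       stringsAsFactors = FALSE)

age_share <-
  toy_ages %>%
  filter(age != "?") %>%                     # NA != "?" evaluates to NA, so NA rows are dropped too
  count(age) %>%                             # one row per age group with its count n
  mutate(pct = round(100 * n / sum(n), 1))   # share of each group in percent

age_share
```

Running the same steps on `pie_1` would give the exact percentages in tabular form.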

Adult Group

pie_2 = 
  dates %>% 
  filter(age == "Adult") %>% 
  filter(!is.na(primary_fur_color))

fig_2 = 
  plot_ly(data = count(pie_2, primary_fur_color), labels = ~primary_fur_color, values = ~n, type = 'pie', insidetextfont = list(color = '#FFFFFF')) %>% 
  layout(title = 'Squirrels Fur Color Distribution (Adult)',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
  
fig_2

The fourth graph shows the distribution of adult squirrels only, by primary fur color. The majority (83.6%) of the adult squirrels were gray, 12.8% were cinnamon, and the remaining 3.62% were black.

Juvenile Group

pie_3  =
  dates %>% 
  filter(age == "Juvenile") %>%
  filter(!is.na(primary_fur_color))

fig_3 = 
  plot_ly(data = count(pie_3, primary_fur_color), labels = ~primary_fur_color, values = ~n, type = 'pie', insidetextfont = list(color = '#FFFFFF')) %>% 
  layout(title = 'Squirrels Fur Color Distribution (Juvenile)',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
  
fig_3

The fifth graph shows the distribution of juvenile squirrels only, by primary fur color. The distribution was similar to that of the adults: 79.5% of the juvenile squirrels were gray, 18% were cinnamon, and the remaining 2.5% were black.

Squirrel Activity by Primary Fur Color

The dataset reports five kinds of activity: foraging, running, eating, climbing and chasing. This plot shows Central Park squirrel activity by primary fur color.

plot_2 = 
  dates %>%
  filter(!is.na(activity)) %>% 
  filter(!is.na(primary_fur_color)) %>% 
  group_by(primary_fur_color) %>%
  count(activity, sort = TRUE)%>%
  ggplot(aes(x = reorder(activity, n), y = n)) + 
  geom_bar(aes(fill = primary_fur_color), stat = "identity") + 
  scale_fill_manual(values = c("#000000", "#D2691E", "#D3D3D3", "white")) +
  theme_classic() + 
  facet_wrap(~primary_fur_color, nrow = 1) +
  labs(title = "Squirrel Activity by Primary Fur Color", y = 'Number of Observations', x = 'Activity') +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()

plot_2

The sixth graph shows squirrel activities by primary fur color. Regardless of fur color, squirrels foraged most frequently and chased least frequently, which makes sense because squirrels need to store food for the cold months.

Squirrel Activity by Location

This plot shows Central Park squirrel activity by location (above ground or ground plane).

dates %>%
  filter(!is.na(activity)) %>% 
  filter(!is.na(location)) %>% 
  ggplot() +
  # position = "fill" rescales each bar to proportions, so no helper count column is needed
  geom_bar(aes(x = activity, fill = location), position = "fill") +
  ggtitle('Central Park Squirrels Activities by Location') +
  theme(plot.title = element_text(hjust = 0.5), panel.grid.minor = element_blank(), panel.grid.major = element_blank()) +
  labs(x = 'Activity', y = 'Proportion', fill = 'Location')

The last graph shows how the distribution of each activity differs by location. For example, climbing happened above ground most of the time, whereas foraging, running and eating happened mostly on the ground plane. Chasing occurred about equally often above ground and on the ground plane.
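
The proportions drawn by the filled bar chart can also be tabulated directly. This is a minimal sketch on made-up rows: `toy_obs` and `location_share` are hypothetical names, and the activity and location labels are assumed spellings rather than values confirmed from the census.

```r
library(dplyr)

# Hypothetical observations: one row per sighting
toy_obs <- data.frame(
  activity = c("climbing", "climbing", "climbing",
               "foraging", "foraging",
               "chasing", "chasing"),
  location = c("Above Ground", "Above Ground", "Ground Plane",
               "Ground Plane", "Ground Plane",
               "Above Ground", "Ground Plane"),
  stringsAsFactors = FALSE
)

location_share <-
  toy_obs %>%
  count(activity, location) %>%       # sightings per activity/location pair
  group_by(activity) %>%
  mutate(prop = n / sum(n)) %>%       # share of each location within an activity
  ungroup()

location_share
```

Within each activity the `prop` values sum to 1, which is the same rescaling that `position = "fill"` applies to the bars.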

Central Park Squirrel Overall Map

This overall map shows where each squirrel was sighted in Central Park, colored by primary fur color.

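# Assumes the data spell the color "Gray"; rows with a missing color are filtered out below, so no NA level is needed
pal_coats <- colorFactor(c("#000000", "#D2691E", "#D3D3D3"), domain = c("Black", "Cinnamon", "Gray"))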

#Locations based on fur color
census %>%
  filter(!is.na(primary_fur_color)) %>%
  leaflet() %>%
  addTiles() %>%
  addCircleMarkers(lng = ~long,
                   lat = ~lat, radius = 3, color = ~pal_coats(primary_fur_color), stroke = FALSE, fillOpacity = 0.5) %>%
  addLegend(position = "topright",pal = pal_coats, values = ~primary_fur_color)

If you would like to track squirrels by their several characteristics (fur color, age, time of day, interaction and activity), we have built an AWESOME interactive app, named “Central Park Squirrel Tracker”.